From pixels to patient care: deep learning-enabled pathomics signature offers precise outcome predictions for immunotherapy in esophageal squamous cell cancer

Background Immunotherapy has significantly improved the survival of esophageal squamous cell cancer (ESCC) patients; however, the clinical benefit is limited to a small proportion of patients. This study aimed to develop a deep learning signature based on H&E-stained pathological specimens to accurately predict the clinical benefit of PD-1 inhibitors in ESCC patients. Methods ESCC patients receiving PD-1 inhibitors at Shandong Cancer Hospital were included. Whole slide images (WSIs) of H&E-stained histological specimens from the included patients were collected and randomly divided into training (70%) and validation (30%) sets. Image labels were defined by progression-free survival (PFS) with an interval of 4 months. A pretrained ViT model was used for patch-level model training, and all patches were projected into probabilities by a linear classifier. The most predictive patches were then passed to an RNN for final patient-level prediction to construct the ESCC-pathomics signature (ESCC-PS). Accuracy and survival analysis were used to evaluate the performance of the ViT-RNN survival model in the validation cohort. Results A total of 163 ESCC patients receiving PD-1 inhibitors were included for model training. After image pre-processing, there were 486,188 patches of 1024 × 1024 pixels from 324 WSIs of H&E-stained histological specimens. There were 120 patients with 227 images in the training cohort and 43 patients with 97 images in the validation cohort, with balanced baseline characteristics between the two groups. The ESCC-PS achieved an accuracy of 84.5% in the validation cohort and could stratify patients into three risk groups with median PFS of 2.6, 4.5 and 12.9 months (P < 0.001). Multivariate Cox analysis revealed that ESCC-PS could act as an independent predictor of survival after PD-1 inhibitors (P < 0.001).
A combined signature incorporating ESCC-PS and PD-L1 expression showed significantly improved accuracy in predicting the outcome of PD-1 inhibitors compared with ESCC-PS and PD-L1 alone, with area under the curve values of 0.904, 0.924 and 0.610 for 6-month PFS and C-indices of 0.814, 0.806 and 0.601, respectively. Conclusions The outcome-supervised pathomics signature based on deep learning has the potential to enable superior prognostic stratification of ESCC patients receiving PD-1 inhibitors, converting image pixels into an effective and labour-saving tool to optimize the clinical management of ESCC patients. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-024-04997-z.


Introduction
Esophageal cancer ranks seventh in incidence and sixth in mortality worldwide [1]. Esophageal squamous cell carcinoma (ESCC) is the predominant type, accounting for 90% of all cases in East Asia and Africa [2]. Despite gradual improvement in survival, the 5-year relative survival rate of ESCC patients remains below 20% [3].
Immunotherapy has revolutionized the treatment of ESCC patients [4]. The phase III KEYNOTE-181 [4] and ATTRACTION-3 [5] trials demonstrated better efficacy and tolerable adverse effects with immune checkpoint inhibitors (ICIs) compared with conventional chemotherapy in advanced ESCC patients. However, the clinical benefit of ICIs is limited to a small proportion of ESCC patients, and a subset of patients may experience rapid tumor progression after receiving ICIs [5]. This highlights the importance of identifying biomarkers that predict which patients could benefit from ICIs.
PD-L1 expression is currently considered a predictive marker for ICIs [6]. Nonetheless, the predictive value of PD-L1 status in ESCC remains controversial [7]. CheckMate 648 indicated that PD-L1 expression of 1% or higher was associated with a significant progression-free survival (PFS) benefit after receiving ICIs [8]. However, the ESCORT-1st trial indicated no significant correlation between PD-L1 status and the efficacy of camrelizumab in ESCC patients [9]. It follows that a single biomarker may not be adequate for accurately predicting the outcomes of PD-1 inhibitors.
Pathology, traditionally the basis of diagnosis, is the cornerstone of modern medicine and cancer care. Moreover, tumor pathology can reflect the heterogeneous characteristics of the tumor microenvironment (TME) and has been found to have prognostic value [10]. The development of digital slide scanners has made it possible to generate whole slide images (WSIs) from pathological slides; these are high-resolution panoramic images containing cell structure and stroma. By exploiting the rich information in WSIs, computational pathology provides insights into the TME and facilitates computer-assisted diagnostics, alleviating the labour-intensive work of pathologists [11].
In the new era of artificial intelligence in oncology, deep learning-based pathology can assist not only in image classification tasks but also in prognosis prediction, by extracting risk-related histopathological features to identify intricate patterns and biological characteristics [12]. Jiang et al. built a GC-SVM classifier using immunomarkers in immunohistochemistry-stained slices and demonstrated its ability to predict the benefit of adjuvant chemotherapy in gastric cancer patients [13]. In addition, a convolutional neural network classifier based on H&E images was shown to perform well in prognosis prediction for stage III colon cancer patients [14]. Although these prognosis prediction studies achieved promising performance using pathological images, the role of pathomics in predicting the clinical benefit of immunotherapy remains largely unknown. Thus, we aimed to develop a deep learning-based pathomics signature using a ViT-RNN network to accurately predict the clinical benefit of immunotherapy in ESCC patients.

Patient cohorts and data resource
ESCC patients receiving PD-1 inhibitors between 1 January 2018 and 1 January 2023 at Shandong Cancer Hospital and Institute were included in this study. The inclusion criteria were as follows: (1) pathologically and radiologically diagnosed esophageal squamous cancer; (2) treatment with PD-1 inhibitors; (3) available survival follow-up data. Patients with another primary malignant neoplasm were excluded from further analysis. Baseline characteristics were collected, including age, gender, smoking history, drinking history, TNM stage, metastasis and radiotherapy. In addition, the H&E-stained histological specimens of the included patients were collected. The WSIs of these specimens were scanned using a Pannoramic MIDI II, Pannoramic SCAN II or Zeiss Axio Scan.Z1 scanner. All images were saved as TIF files in the pathological dataset and were randomly divided into training (70%) and validation (30%) sets. The detailed clinical characteristics and survival time of the cohort were also retrieved. Immunohistochemistry staining for PD-L1 was performed on FFPE tumour tissue using PD-L1 (22C3) monoclonal antibodies. PD-L1 expression was measured by the tumor proportion score (TPS), defined as the percentage of membrane-positive tumor cells among all tumor cells.

Label generation
The primary endpoint of the study was the PFS of included patients after receiving PD-1 inhibitors, defined as the time from the start of PD-1 inhibitor treatment to disease progression or death. The secondary endpoint was the overall survival (OS) of included patients after receiving PD-1 inhibitors, defined as the time from the start of PD-1 inhibitor treatment to death. The PFS time of the cohort was used to generate the labels for model training. In this study, survival prediction after PD-1 inhibitors was formulated as a classification problem.
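As an illustration of how a survival time can be turned into a class label, the sketch below bins PFS in months into three ordered classes. The cut points of 4 and 8 months are an assumption chosen only to match a 4-month interval and three groups; the function name is hypothetical, and censored observations are not handled here.

```python
from bisect import bisect_right

def pfs_to_class(pfs_months, cut_points=(4.0, 8.0)):
    """Map a PFS time (months) to an ordered class index.

    cut_points are ASSUMED values for illustration, not taken from the study.
    With the defaults: [0, 4) -> 0, [4, 8) -> 1, >= 8 -> 2.
    """
    if pfs_months < 0:
        raise ValueError("PFS must be non-negative")
    # bisect_right counts how many cut points are <= pfs_months
    return bisect_right(cut_points, pfs_months)

labels = [pfs_to_class(t) for t in (2.6, 4.5, 12.9)]
```

Keeping the cut points as a parameter makes it easy to try a different discretization without touching the rest of a training pipeline.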

WSI images pre-processing
After digitization, the WSIs were pre-processed for patch-level and patient-level model training. We split each WSI into patches of 1024 × 1024 pixels at 20× magnification. To eliminate unnecessary white background from further processing, we selected patches with a pixel variance greater than 500. An additional challenge was that the stain color distribution differed between WSIs because of the complex staining process; we addressed this with slide-level color normalization using the Macenko method.
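The tiling and background filtering step can be sketched as follows. This is a minimal example that reads the threshold of 500 as a variance of grayscale pixel values, which is an interpretation of the text; the function names are hypothetical, tiles are assumed to be non-overlapping, and stain normalization is not shown.

```python
from statistics import pvariance

def tile_origins(width, height, tile=1024):
    """Top-left corners of non-overlapping tiles that fit fully inside the slide."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def is_tissue(gray_pixels, min_variance=500.0):
    """Keep a patch when the variance of its grayscale values exceeds the threshold.

    gray_pixels is a flat sequence of 0-255 intensities; the variance reading of
    the threshold is an ASSUMPTION made for this sketch.
    """
    return pvariance(gray_pixels) > min_variance

origins = tile_origins(2500, 2100)       # slide size chosen only for the demo
blank = [255] * 16                       # uniform white background
textured = [0, 255] * 8                  # strongly varying intensities
```

A nearly uniform white patch has variance close to zero and is dropped, while stained tissue produces a much wider spread of intensities.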

Patch-level model training
First, the ViT_base_patch_16_384 architecture, pretrained on the ImageNet dataset, was used for patch-level model training. The inputs were the patch images obtained by splitting the WSIs, and each patch inherited the label of the WSI of the respective patient. To reduce the influence of noise and prevent overfitting, symmetric cross-entropy was used as the loss function. The Adam optimizer was used with an initial learning rate of 0.0001, a weight decay of 0.0001 and a batch size of 50 images. The patch-level model was trained for 50 epochs.
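For readers unfamiliar with symmetric cross-entropy, the sketch below computes it for a single sample as a weighted sum of the usual cross-entropy and a reverse cross-entropy term. The weights `alpha` and `beta` and the stand-in value for log(0) are assumptions for illustration; the values used in the study are not stated.

```python
import math

def symmetric_cross_entropy(probs, target, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Symmetric cross-entropy for one sample: alpha * CE + beta * RCE.

    probs    : predicted class probabilities (e.g. a softmax output)
    target   : index of the labelled class
    log_zero : finite value used in place of log(0) in the reverse term
    alpha, beta and log_zero are ASSUMED defaults, not the study's settings.
    """
    eps = 1e-7
    # Standard cross-entropy against a one-hot label
    ce = -math.log(max(probs[target], eps))
    # Reverse cross-entropy: -sum_k p_k * log(q_k) with q one-hot, so the
    # target class contributes log(1) = 0 and every other class log_zero
    rce = -sum(p * log_zero for k, p in enumerate(probs) if k != target)
    return alpha * ce + beta * rce

loss_confident = symmetric_cross_entropy([1.0, 0.0, 0.0], target=0)
loss_uncertain = symmetric_cross_entropy([0.5, 0.25, 0.25], target=0)
```

Because the reverse term is bounded by the magnitude of `log_zero`, a mislabelled patch cannot dominate the loss the way it can with plain cross-entropy, which is the usual motivation for using it with patch labels inherited from the slide.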

Prediction probability distribution of patches at patient level
To obtain the patient-level probability distribution of patches, the softmax output vectors were used to train a linear classifier. All patches from the WSIs of a single patient were gathered together and projected into probabilities using the linear classifier. The patches were ranked by their prediction probabilities, and the 100 patches with the highest probabilities were selected and assigned to the patient. A feature extractor was then trained for patch selection.
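The ranking step amounts to a top-k selection over the patches of one patient. The sketch below assumes each patch is represented as a hypothetical `(patch_id, probability)` pair; the value of `k` is left as a parameter.

```python
import heapq

def top_k_patches(patch_probs, k=100):
    """Return the k patches with the highest probability, in descending order.

    patch_probs is a list of (patch_id, probability) pairs; this data layout
    is an ASSUMPTION made for the example.
    """
    return heapq.nlargest(k, patch_probs, key=lambda item: item[1])

patches = [("p1", 0.20), ("p2", 0.90), ("p3", 0.50), ("p4", 0.75)]
selected = top_k_patches(patches, k=2)
```

`heapq.nlargest` avoids sorting the full list, which helps when a single patient contributes thousands of patches.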

Training and validation of patient-level pathomics signature
The 40 most suspicious patches of each WSI were passed sequentially to the RNN for feature integration and final patient-level prediction. Cross-entropy was used as the loss for the RNN model, and the Adam optimizer was used with a batch size of 2. The ESCC-pathomics signature (ESCC-PS) was constructed from the patient-level ViT-RNN. All patients were assigned to three groups according to the ESCC-PS, and accuracy was calculated to evaluate performance in the validation cohort. Univariate and multivariate survival analyses were performed to confirm the predictive effect of ESCC-PS in the validation cohort. The whole process of this study is shown in Fig. 1.
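To make the sequential aggregation concrete, the sketch below runs a plain tanh recurrent cell over 40 patch feature vectors and classifies from the last hidden state. The cell type, the tiny dimensions and the random, untrained weights are all assumptions for illustration; the RNN architecture used in the study is not described in this detail.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_predict(features, w_in, w_rec, w_out):
    """Feed patch features in order through a tanh RNN, then classify."""
    hidden = [0.0] * len(w_rec)
    for x in features:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return softmax(matvec(w_out, hidden))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes and untrained weights, for demonstration only
rng = random.Random(0)
feat_dim, hidden_dim, n_classes = 8, 4, 3
w_in = random_matrix(hidden_dim, feat_dim, rng)
w_rec = random_matrix(hidden_dim, hidden_dim, rng)
w_out = random_matrix(n_classes, hidden_dim, rng)
patch_features = [[rng.random() for _ in range(feat_dim)] for _ in range(40)]
probs = rnn_predict(patch_features, w_in, w_rec, w_out)
```

The output is one probability per risk group for the patient; in practice the weights would be learned with cross-entropy and an optimizer such as Adam.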

Incremental value of ESCC-PS for expression of PD-L1
The expression of PD-L1 was evaluated using immunohistochemical staining. The cut-off value of PD-L1 was determined by the receiver operating characteristic (ROC) curve in the training cohort, in order to divide patients into high- and low-expression groups. The predictive effect of PD-L1 was evaluated by Cox regression analysis. ESCC-PS was combined with PD-L1 expression for outcome prediction of PD-1 inhibitors to determine the incremental value of ESCC-PS. Patients in the validation cohort were divided into low-risk, medium-risk and high-risk groups based on the combination of ESCC-PS and PD-L1. The performance of PD-L1, ESCC-PS and the combined signature was assessed and compared by C-index in the validation cohort.
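The C-index used for this comparison can be illustrated with a short sketch of Harrell's concordance index in plain Python. The function name and toy data are assumptions for the example; the study reports using SPSS and R for its analyses, so this is not the authors' code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) at a time earlier than times[j]. The pair is concordant
    when subject i also has the higher risk score; tied scores count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: months to progression, event indicator (0 = censored), risk score
times = [2, 4, 6, 8]
events = [1, 1, 1, 0]
c_perfect = concordance_index(times, events, [4, 3, 2, 1])
c_reversed = concordance_index(times, events, [1, 2, 3, 4])
c_flat = concordance_index(times, events, [1, 1, 1, 1])
```

A value of 0.5 corresponds to a score that carries no ranking information, and 1.0 to a score that orders every comparable pair correctly.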

Statistical analysis
The ESCC-PS was constructed using Python 3.6. SPSS 26 and R 4.3.1 were used for data analysis and visualization. Kaplan-Meier survival analysis was carried out to verify the clinical significance of ESCC-PS. Multivariable analysis was conducted using Cox proportional hazards modeling to validate the predictive value of ESCC-PS. Interactions between ESCC-PS and patient characteristics were assessed with the χ2 test. ROC curves with the area under the curve (AUC) were used to compare the performance of PD-L1, ESCC-PS and the combined signature. All tests were two-sided, and P < 0.05 was considered to indicate statistical significance.
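As a reminder of what the Kaplan-Meier analysis estimates, the sketch below computes the product-limit survival curve and the median survival time from toy data. It is a plain-Python illustration with hypothetical function names, not the SPSS or R procedure used in the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival probability) at event times.

    events[i] is 1 when the event was observed and 0 when the subject was censored.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Toy data: months and event indicators (0 = censored)
curve = kaplan_meier([2, 3, 5, 7, 9], [1, 0, 1, 1, 1])
median = median_survival(curve)
```

When the curve never falls to 0.5, the function returns `None`, which corresponds to a median reported as not reached.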

Clinical characteristics of patients with ESCC receiving immunotherapy
A total of 163 ESCC patients receiving PD-1 inhibitors at Shandong Cancer Hospital with baseline data and known outcomes were included in the analysis. From these patients, 324 WSIs of H&E-stained slides were retrieved and randomly divided into a training cohort and a validation cohort in a 7:3 ratio. All patches from a single patient were assigned to the same cohort. There were 120 patients with 227 images in the training cohort and 43 patients with 97 images in the validation cohort. The clinical characteristics of the patients are shown in Table 1, and no significant difference was observed between the training and validation cohorts. After a median follow-up of 14.2 months, the median PFS was 7.9 months for the whole cohort. The optimal cut-off of PD-L1 expression was set at 57.5%, with an AUC of 0.632 in the training cohort.
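Keeping all data from one patient in the same cohort is typically done by splitting on patient identifiers rather than on images or patches. The sketch below shows one way to do this; the function name, seed and the choice to apply the 70% fraction to patients are assumptions, so it is not expected to reproduce the exact cohort sizes reported here.

```python
import random

def split_by_patient(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split unique patient IDs into training and validation sets.

    Every image or patch then follows its patient, so no patient appears in
    both cohorts. The seed and fraction are ASSUMED demo values.
    """
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return set(ids[:n_train]), set(ids[n_train:])

# One entry per image; patients with several slides appear more than once
image_patients = ["pt%02d" % (i % 10) for i in range(25)]
train_ids, val_ids = split_by_patient(image_patients)
train_images = [p for p in image_patients if p in train_ids]
```

Splitting at this level prevents patches of the same tumor from leaking between training and validation data.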

The training of ViT-RNN survival model and construction of ESCC-PS
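There were 486,188 patches of 1024 × 1024 pixels from 324 WSIs after image pre-processing. Subsampling was then used to resize these patches to 384 × 384 pixels to fit the pretrained ViT_base_patch_16_384. All patches were used as input to the ViT for patch-level model training and were then projected into group probabilities. The 40 most predictive patches were passed sequentially to the ViT-RNN for final patient-level prediction and construction of the ESCC-PS. Patients in the validation cohort were projected into three groups based on the ESCC-PS: 19 patients in the ESCC-PS 1 group, 13 in the ESCC-PS 2 group and 11 in the ESCC-PS 3 group. As shown in Table 2, no potential interactions were found between ESCC-PS and patient characteristics in the validation cohort.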

Test performance of ESCC-PS for outcome prediction of PD-1 inhibitors in validation cohort
The ViT-RNN-based ESCC-PS achieved an accuracy of 92% in the training cohort and 84.5% in the validation cohort. Four iterations of random partition of all patients were performed to investigate the stability of the model; the accuracies ranged from 82.7% to 95.1% in the validation cohort, indicating that the model was stable. The Kaplan-Meier survival curves showed a significant difference in PFS (P < 0.001), with median PFS of 2.7, 4.8 and 16.7 months in the ESCC-PS 1, ESCC-PS 2 and ESCC-PS 3 groups, respectively (Fig. 2A). OS was also superior (P < 0.001) for patients in the ESCC-PS 3 group (not reached) compared with the ESCC-PS 1 (6.3 months) and ESCC-PS 2 (20.2 months) groups (Fig. 2B). The predictive effect of ESCC-PS (P < 0.001) was also shown according to the [...] 3, the expression of PD-L1 was also associated with the PFS of ESCC patients receiving immunotherapy (HR = 0.486, [...] S1 and S2).

Incremental value of the ESCC-PS added to the expression of PD-L1 for outcome prediction
ESCC-PS and PD-L1 expression were combined on the basis of multivariate Cox regression analyses to assess the incremental value of ESCC-PS over PD-L1 expression for predicting the outcome of PD-1 inhibitors. As shown in Fig. 3, survival analysis indicated that the combined signature could significantly stratify patients into high-risk, medium-risk and low-risk groups, with median PFS of 2.6, 4.5 and 12.9 months and OS of 6.3, 20.2 and 34.8 months, respectively (Table 4). The C-index of ESCC-PS for PFS after receiving PD-1 inhibitors was 0.806, which was significantly higher than that of PD-L1 expression (0.601, P < 0.001). As shown in Table 5, a significantly higher C-index was observed for the combination of ESCC-PS and PD-L1 compared with PD-L1 alone (0.814 vs 0.601, P < 0.001). The comparison of ROC curves for prediction of PFS and OS between PD-L1, ESCC-PS and ESCC-PS + PD-L1 is shown in Fig. 4. The AUC of ESCC-PS + PD-L1 for predicting 6-month PFS (0.904 vs 0.610, P < 0.001) and 12-month PFS (0.868 vs 0.679, P = 0.099) was higher than that of PD-L1. Similarly, ESCC-PS + PD-L1 also exhibited a higher AUC for predicting 12-month OS (0.901 vs 0.643, P < 0.001) and 18-month OS (0.883 vs 0.626, P < 0.001).

Discussion
Herein, we constructed a computational pathomics signature named ESCC-PS from H&E-stained WSIs, based on an outcome-supervised ViT-RNN, to predict the survival of ESCC patients receiving PD-1 inhibitors. Validation experiments confirmed the excellent performance and independent predictive effect of ESCC-PS, as well as its incremental value over PD-L1 expression for outcome prediction of PD-1 inhibitors.
Microscopic examination, including H&E-stained histopathological images, is the cornerstone of cancer diagnosis and prognosis. However, histological identification by pathologists faces challenges, including tumor heterogeneity. Tumors with the same histology can have different prognoses, and tumors with different histology can follow the same course. Some intrinsic pathological features that cannot be recognized by the human eye might have a greater impact on the development and prognosis of tumors. In addition, the extremely large spatial size of WSIs makes it difficult to extract hand-crafted features.
Deep learning, the state-of-the-art technique in computer vision, has shown the potential to automatically identify and analyze high-throughput features in WSIs, which are not limited to hand-crafted features derived from existing knowledge. Deep learning pathomics can be used for objective diagnosis, phenotype recognition and prognostic prediction, translating pathological pixels into patient care [15][16][17]. Shi et al. established an interpretable pathomics model using ResNet-18 and quantified a "tumor risk score (TRS)" to predict the clinical outcomes of hepatocellular carcinoma patients [18]. Qaiser et al. constructed a weakly supervised survival convolutional neural network (WSS-CNN) approach equipped with a visual attention mechanism based on WSIs of H&E-stained specimens, and demonstrated its superior performance for outcome prediction in lung cancer patients, with a C-index of 0.6863 [19].
In this study, the ESCC-PS based on WSIs of H&E-stained specimens was constructed for outcome prediction of PD-1 inhibitors. The deep neural network used for model training was a vision transformer with an embedded self-attention mechanism, which takes image data as input rather than manually extracted features. A previous study demonstrated the superior performance of the ViT pretrained on the ImageNet dataset, which was used in this study [20], compared with CNNs. The output of the survival prediction model was a risk stratification obtained by splitting the survival time into three discrete groups. The patches from WSIs, together with positional embeddings, served as the input of the transformer encoder [21], which incorporated more spatial information at lower layers. The self-attention mechanism allows the ViT to capture global features of the image rather than only the dependencies between adjacent elements. The pathomics signature based on this ViT-RNN prognostic model is expected to identify ESCC patients who might benefit from PD-1 inhibitors and to assist in the development of individual treatment strategies.
Deep learning on H&E-stained specimens has also been found to characterize the TME to some extent, which could uncover more information for prognosis prediction [22,23]. Jiao et al. used a CNN to recognize the stroma, tumor, necrosis and lymphocyte components of the TME from H&E-stained colon cancer specimens, and survival analysis indicated their prognostic value in colorectal cancer [23]. The WSIs of H&E-stained specimens contain the fine details of the tissue. Deep learning-based pathomics could be a useful tool for mining these data, including features of cells, intercellular junctions and others, which are well suited to risk stratification and prognosis prediction and might explain the predictive ability of ESCC-PS. However, the interpretability of the ESCC-PS model and its predictive biological features need to be investigated in future studies.
Despite its limited performance, PD-L1 expression remains the cornerstone for predicting the outcome of ESCC patients receiving PD-1 inhibitors [24], which was also demonstrated in this study. In addition, we found that ESCC-PS performed better than PD-L1 expression. As the ESCC-PS was derived from H&E-stained sections, which are routinely prepared in the clinic, it might be conveniently applied in clinical practice without additional financial burden. Furthermore, significantly improved predictive performance was observed when ESCC-PS was added to PD-L1 expression, with the C-index improving from 0.659 to 0.819. This result indicates that ESCC-PS provides additional biological information and might favour the personalized application of PD-1 inhibitors.
There were still some limitations in this study. First, further validation of this pathomics prognostic model in an external cohort is needed. Second, the black-box nature of deep learning makes it difficult to explore the biological mechanism by which the ViT-RNN model predicts outcome. Feasible ways to explore the mechanism linking ESCC-PS to the benefit of PD-1 inhibitors need further investigation. In addition, multi-omics-based outcome prediction, including but not limited to pathomics, radiomics and genomics, might achieve better performance.
In conclusion, we developed and verified a pathomics model named ESCC-PS based on the ViT-RNN framework from WSIs. The ESCC-PS could act as an excellent predictor and played a complementary role to PD-L1 for outcome prediction of PD-1 inhibitors, which could aid clinical decision making.

Fig. 1 The overall workflow of ESCC-PS construction. A The process of sample splitting. B Schematic illustration of histopathology image processing and ESCC-PS construction based on ViT-RNN. ESCC-PS esophageal squamous cell cancer-pathomics signature, ViT-RNN Vision Transformer-Recursive Neural Network

Fig. 2 Kaplan-Meier survival curves according to the ESCC-PS in validation cohort. A Kaplan-Meier survival curves of PFS (2.7 vs 4.8 vs 16.7 months, P < 0.001) according to the ESCC-PS. B Kaplan-Meier survival curves of OS (6.3 vs 20.2 months vs Unreached, P < 0.001) according to the ESCC-PS. PFS progression-free survival, OS overall survival, ESCC-PS esophageal squamous cell cancer-pathomics signature

Fig. 3 Kaplan-Meier survival curves according to the incorporation of ESCC-PS and PD-L1 in validation cohort. A Kaplan-Meier survival curves of PFS (2.6 vs 4.5 vs 12.9 months, P < 0.001) according to the incorporation of ESCC-PS and PD-L1. B Kaplan-Meier survival curves of OS (6.3 vs 20.2 vs 34.8 months, P < 0.001) according to the incorporation of ESCC-PS and PD-L1. PFS progression-free survival, OS overall survival, ESCC-PS esophageal squamous cell cancer-pathomics signature

Table 1
Baseline characteristics of patients in training cohort and validation cohort

Table 2
Correlation analysis between ESCC-PS and patient characteristics

Table 3
Univariate and multivariate Cox regression analysis of ESCC-PS and clinicopathological characteristics for progression-free survival in validation cohort

Table 4
Univariate and multivariate Cox regression analysis of ESCC-PS and clinicopathological characteristics for overall survival in validation cohort

Table 5
Comparison of C-index for progression-free survival in validation cohort

Fig. 4 The comparison of ROC curves for survival of PD-L1, ESCC-PS and ESCC-PS + PD-L1 in validation cohort. A The comparison of ROC curves for evaluating 6-month PFS (AUROC: 0.610, 0.924 and 0.904). B The comparison of ROC curves for evaluating 12-month PFS (AUROC: 0.679, 0.857 and 0.868). C The comparison of ROC curves for evaluating 12-month OS (AUROC: 0.643, 0.886 and 0.901). D The comparison of ROC curves for evaluating 18-month OS (AUROC: 0.626, 0.838 and 0.883). ROC receiver operating characteristic, AUROC area under ROC curve, PFS progression-free survival, OS overall survival, ESCC-PS esophageal squamous cell cancer-pathomics signature